Detection of Documentary Scene Changes by Audio-Visual Fusion
Abstract
The concept of a documentary scene was derived from the audio-visual characteristics of certain documentary videos. We observed that, for most portions of these videos, the visual component alone did not carry enough information to convey a semantic context, whereas jointly observing the visual and audio components conveyed a better one. From these observations on the video data, we generated an audio score and a visual score. We then computed a weighted audio-visual score within an interval and adaptively expanded or shrank this interval until we found a local maximum of the score. The video is thereby divided into a set of intervals that correspond to its documentary scenes. After obtaining this set of documentary scenes, we checked for redundant detections.
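The abstract describes the segmentation only at a high level, so the following is a hypothetical Python sketch, not the authors' method. It assumes per-time-step audio and visual scores in [0, 1] that are already computed, fuses them linearly with a weight `w`, takes the fused score at an interval's end index as that interval's score, and hill-climbs the end point (expanding or shrinking by one step) until a local maximum is reached. A detection is treated as redundant when its fused boundary score falls below a threshold `tau`. All names and parameters (`fuse`, `find_boundary`, `merge_redundant`, `segment`, `w`, `init_len`, `min_len`, `tau`) are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Scene = Tuple[int, int]  # half-open interval [start, end) of time steps


def fuse(audio: Sequence[float], visual: Sequence[float], w: float) -> List[float]:
    """Weighted audio-visual score per time step (assumed linear fusion)."""
    if len(audio) != len(visual):
        raise ValueError("audio and visual scores must have the same length")
    return [w * a + (1.0 - w) * v for a, v in zip(audio, visual)]


def find_boundary(score: Sequence[float], start: int, init_len: int,
                  min_len: int) -> Optional[int]:
    """Expand or shrink [start, end) until score[end] is a local maximum.

    Assumption: an interval's score is the fused score at its end index.
    Returns the index where the next scene starts, or None if no boundary fits.
    """
    n = len(score)
    lo = start + min_len  # shortest allowed scene
    if lo > n - 1:
        return None
    end = max(min(start + init_len, n - 1), lo)
    while True:
        best = end
        if end + 1 < n and score[end + 1] > score[best]:
            best = end + 1  # expanding the interval raises the score
        if end - 1 >= lo and score[end - 1] > score[best]:
            best = end - 1  # shrinking gives the higher score
        if best == end:
            return end  # local maximum reached
        end = best


def merge_redundant(scenes: List[Scene], score: Sequence[float],
                    tau: float) -> List[Scene]:
    """Merge a scene into its predecessor if its boundary score is below tau."""
    if not scenes:
        return []
    merged = [scenes[0]]
    for s, e in scenes[1:]:
        if score[s] < tau:
            merged[-1] = (merged[-1][0], e)  # weak boundary: treat as redundant
        else:
            merged.append((s, e))
    return merged


def segment(audio: Sequence[float], visual: Sequence[float], w: float = 0.5,
            init_len: int = 5, min_len: int = 2, tau: float = 0.5) -> List[Scene]:
    """Split the video into scenes, then remove redundant detections."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    score = fuse(audio, visual, w)
    n = len(score)
    scenes: List[Scene] = []
    start = 0
    while start < n:
        end = find_boundary(score, start, init_len, min_len)
        if end is None:  # too little video left for another boundary
            scenes.append((start, n))
            break
        scenes.append((start, end))
        start = end
    return merge_redundant(scenes, score, tau)
```

For example, with audio scores `[0, 0, 0.2, 0.9, 0.1, 0, 0, 0.8, 0.1, 0]`, visual scores `[0, 0.1, 0.3, 0.7, 0.1, 0, 0.1, 0.6, 0.2, 0]`, `w=0.5`, `init_len=4`, `min_len=2` and `tau=0.5`, `segment` returns `[(0, 3), (3, 7), (7, 10)]`: the weak final boundary at step 9 is merged into the preceding scene. Hill climbing only guarantees a local maximum, which matches the abstract's wording but makes the result depend on the initial interval length.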
Similar resources
Enhanced Eigen-audioframes for Audio-visual Scene Change Detection (IEEE Transactions on Multimedia, EDICS: 4-SEGM)
In this paper, a novel audio-visual scene change detection algorithm is presented and evaluated experimentally. An enhanced set of eigen-audioframes is created that is related to an audio signal subspace, where audio background changes are easily discovered. An analysis is presented that justifies why this subspace favors scene change detection. Additionally, a novel process is developed in ord...
RUCMM at MediaEval 2015 Affective Impact of Movies Task: Fusion of Audio and Visual Cues
This paper summarizes the efforts of our first participation in the Violent Scene Detection subtask of the MediaEval 2015 Affective Impact of Movies Task. We build violent scene detectors using both audio and visual cues. In particular, the audio cue is represented by bag-of-audio-words with Fisher vector encoding. The visual cue is exploited by extracting CNN features from video frames. ...
Scene Understanding through Audio-Visual Fusion
Scene understanding involves the integration of a wide variety of information to produce a thorough description of the robot's environment. By integrating spatial, visual and audio cues, we can provide greater understanding than can be obtained using any one of the modalities alone. In this paper, we describe our current work on using audition to enhance existing object detection and t...
A multimedia content modeling and classification methodology using visual information for the protection of sensitive user groups
The thesis concerns the problems of visual tracking and violence detection in video sequences. For the visual tracking problem, two feature fusion frameworks are presented. For violence detection, a system that classifies movie segments as violent or non-violent is proposed. The first tracking framework, called ‘Model Fusion via Proposal’ (MFP), provides a way to efficiently fuse visua...
Multimodal and ontology-based fusion approaches of audio and visual processing for violence detection in movies
In this paper we present our research results towards the detection of violent scenes in movies, employing advanced fusion methodologies, based on learning, knowledge representation and reasoning. Towards this goal, a multi-step approach is followed: initially, automated audio and visual analysis is performed to extract audio and visual cues. Then, two different fusion approaches are deployed: ...
Publication date: 2003